Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap Sampling
نویسندگان
چکیده
Although publicly accessible databases containing speech documents. It requires a great deal of time and effort required to keep them up to date is often burdensome. In an effort to help identify speaker of speech if text is available, text-mining tools, from the machine learning discipline, it can be applied to help in this process also. Here, we describe and evaluate document classification algorithms i.e. a combo pack of text mining and classification. This task asked participants to design classifiers for identifying documents containing speech related information in the main literature, and evaluated them against one another. Expected systems utilizes a novel approach of k -nearest neighbour classification and compare its performance by taking different values of k. Keywords— Data Mining, Text Mining, Classification, K-nearest neighbour, KNN
منابع مشابه
Weighted K-Nearest Neighbor Classification Algorithm Based on Genetic Algorithm
K-Nearest Neighbor (KNN) is one of the most popular algorithms for data classification. Many researchers have found that the KNN algorithm accomplishes very good performance in their experiments on different datasets. The traditional KNN text classification algorithm has limitations: calculation complexity, the performance is solely dependent on the training set, and so on. To overcome these li...
متن کاملImpact of the Sakoe-Chiba Band on the DTW Time Series Distance Measure for kNN Classification
For classification of time series, the simple 1-nearest neighbor (1NN) classifier in combination with an elastic distance measure such as Dynamic Time Warping (DTW) distance is considered superior in terms of classification accuracy to many other more elaborate methods, including k-nearest neighbor (kNN) with neighborhood size k > 1. In this paper we revisit this apparently peculiar relationshi...
متن کاملA Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection
K nearest neighbor algorithm is one of the most frequently used techniques in data mining for its integrity and performance. Though the KNN algorithm is highly effective in many cases, it has some essential deficiencies, which affects the classification accuracy of the algorithm. First, the effectiveness of the algorithm is affected by redundant and irrelevant features. Furthermore, this algori...
متن کاملارائه روشی برای استخراج کلمات کلیدی و وزندهی کلمات برای بهبود طبقهبندی متون فارسی
Due to ever-increasing information expansion and existing huge amount of unstructured documents, usage of keywords plays a very important role in information retrieval. Because of a manually-extraction of keywords faces various challenges, their automated extraction seems inevitable. In this research, it has been tried to use a thesaurus, (a structured word-net) to automatically extract them. A...
متن کاملA Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure
Author profiling is a text classification technique, which is used to predict the profiles of unknown text by analyzing their writing styles. Author profiles are the characteristics of the authors like gender, age, nativity language, country and educational background. The existing approaches for Author Profiling suffered from problems like high dimensionality of features and fail to capture th...
متن کامل